Theoretical Properties of the Overlapping Groups Lasso
We present two sets of theoretical results on the grouped lasso with overlap
of Jacob, Obozinski and Vert (2009) in the linear regression setting. This
method allows joint selection of predictors in sparse regression and supports
complex structured sparsity over the predictors, encoded as a set of groups.
This flexible framework suggests that arbitrarily complex structures can be
encoded with an intricate set of groups. Our results show that this strategy
has unexpected theoretical consequences for the procedure. In
particular, we give two sets of results: (1) finite sample bounds on prediction
and estimation, and (2) asymptotic distribution and selection. Both sets of
results give insight into the consequences of choosing an increasingly complex
set of groups for the procedure, as well as what happens when the set of groups
cannot recover the true sparsity pattern. Additionally, these results
demonstrate the differences and similarities between the grouped lasso
procedure with and without overlapping groups. Our analysis shows the set of
groups must be chosen with caution: an overly complex set of groups will
damage the analysis.
Comment: 20 pages, submitted to Annals of Statistic
Entropy balancing is doubly robust
Covariate balance is a conventional key diagnostic for methods used to
estimate causal effects from observational studies. Recently, there has been
emerging interest in directly incorporating covariate balance into the
estimation. We study a recently proposed entropy maximization method called
Entropy Balancing (EB), which exactly matches the covariate moments for the
different experimental groups in its optimization problem. We show EB is doubly
robust with respect to linear outcome regression and logistic propensity score
regression, and it reaches the asymptotic semiparametric variance bound when
both regressions are correctly specified. This is surprising to us because
there is no attempt to model the outcome or the treatment assignment in the
original proposal of EB. Our theoretical results and simulations suggest that
EB is a very appealing alternative to the conventional weighting estimators
that estimate the propensity score by maximum likelihood.
Comment: 23 pages, 6 figures, Journal of Causal Inference 201
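As a rough illustration of the moment-matching idea in the Entropy Balancing abstract above, the sketch below reweights a control group so that its weighted mean of a single covariate equals a target (for example, the treated-group mean). It assumes uniform base weights and one covariate with only its first moment balanced, in which case the maximum-entropy weights take the exponential-tilting form w_i ∝ exp(lam * x_i). The function name `entropy_balance_weights` and the bisection solver are illustrative choices, not the authors' implementation.

```python
import math


def entropy_balance_weights(x_control, target_mean, iterations=200):
    """Hypothetical helper: weights on control units that sum to 1 and whose
    weighted mean of x equals target_mean (single covariate, first moment).

    Assumes uniform base weights, so the solution has the form
    w_i proportional to exp(lam * x_i) for a scalar lam.
    """
    if not (min(x_control) < target_mean < max(x_control)):
        raise ValueError("target_mean must lie strictly inside the range of x_control")

    def weights(lam):
        # Subtract the maximum exponent before exponentiating to avoid overflow.
        exponents = [lam * x for x in x_control]
        top = max(exponents)
        raw = [math.exp(e - top) for e in exponents]
        total = sum(raw)
        return [r / total for r in raw]

    def tilted_mean(lam):
        return sum(w * x for w, x in zip(weights(lam), x_control))

    # The tilted mean is increasing in lam, so bracket the root, then bisect.
    lo, hi = -1.0, 1.0
    while tilted_mean(lo) > target_mean:
        lo *= 2.0
    while tilted_mean(hi) < target_mean:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if tilted_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return weights(0.5 * (lo + hi))
```

Calling `entropy_balance_weights([0.0, 1.0, 2.0, 3.0, 4.0], 3.0)` returns positive weights that sum to one and shift the control mean to the target. With several covariates or higher moments the same dual problem becomes multivariate and would need a general convex solver rather than bisection.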
Structured, sparse regression with application to HIV drug resistance
We introduce a new version of forward stepwise regression. Our modification
finds solutions to regression problems where the selected predictors appear in
a structured pattern, with respect to a predefined distance measure over the
candidate predictors. Our method is motivated by the problem of predicting
HIV-1 drug resistance from protein sequences. We find that our method improves
the interpretability of drug resistance while producing predictive
accuracy comparable to standard methods. We also demonstrate our method in a
simulation study and present some theoretical results and connections.
Comment: Published at http://dx.doi.org/10.1214/10-AOAS428 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
The clustering of galaxies in the SDSS-III Baryon Oscillation Spectroscopic Survey: measuring structure growth using passive galaxies
We explore the benefits of using a passively evolving population of galaxies
to measure the evolution of the rate of structure growth between z=0.25 and
z=0.65 by combining data from the SDSS-I/II and SDSS-III surveys. The
large-scale linear bias of a population of dynamically passive galaxies, which
we select from both surveys, is easily modeled. Knowing the bias evolution
breaks degeneracies inherent to other methodologies, and decreases the
uncertainty in measurements of the rate of structure growth and the
normalization of the galaxy power-spectrum by up to a factor of two. If we
translate our measurements into a constraint on sigma_8(z=0) assuming a
concordance cosmological model and General Relativity (GR), we find that using
a bias model improves our uncertainty by a factor of nearly 1.5. Our results
are consistent with a flat Lambda Cold Dark Matter model and with GR.
Comment: Accepted for publication in MNRAS (clarifications added, results and
conclusions unchanged
Detection of Baryon Acoustic Oscillation Features in the Large-Scale 3-Point Correlation Function of SDSS BOSS DR12 CMASS Galaxies
We present the large-scale 3-point correlation function (3PCF) of the SDSS
DR12 CMASS sample of Luminous Red Galaxies, the largest-ever sample
used for a 3PCF or bispectrum measurement. We make the first high-significance
() detection of Baryon Acoustic Oscillations (BAO) in the 3PCF.
Using these acoustic features in the 3PCF as a standard ruler, we measure the
distance to to precision (statistical plus systematic). We
find for our
fiducial cosmology (consistent with Planck 2015) and bias model. This
measurement extends the use of the BAO technique from the 2-point correlation
function (2PCF) and power spectrum to the 3PCF and opens an avenue for deriving
additional cosmological distance information from future large-scale structure
redshift surveys such as DESI. Our measured distance scale from the 3PCF is
fairly independent from that derived from the pre-reconstruction 2PCF and is
equivalent to increasing the length of BOSS by roughly 10%; reconstruction
appears to lower the independence of the distance measurements. Fitting a model
including tidal tensor bias yields a moderate significance (
detection of this bias with a value in agreement with the prediction from local
Lagrangian biasing.
Comment: 15 pages, 7 figures, submitted MNRA
Baryon Acoustic Oscillations in the Sloan Digital Sky Survey Data Release 7 Galaxy Sample
The spectroscopic Sloan Digital Sky Survey (SDSS) Data Release 7 (DR7) galaxy
sample represents the final set of galaxies observed using the original SDSS
target selection criteria. We analyse the clustering of galaxies within this
sample, including both the Luminous Red Galaxy (LRG) and Main samples, and also
include the 2-degree Field Galaxy Redshift Survey (2dFGRS) data. Baryon
Acoustic Oscillations are observed in power spectra measured for different
slices in redshift; this allows us to constrain the distance-redshift relation
at multiple epochs. We achieve a distance measure at redshift z=0.275 of
r_s(z_d)/D_V(0.275)=0.1390+/-0.0037 (2.7% accuracy), where r_s(z_d) is the
comoving sound horizon at the baryon drag epoch,
D_V(z)=[(1+z)^2D_A^2cz/H(z)]^(1/3), D_A(z) is the angular diameter distance and
H(z) is the Hubble parameter. We find an almost independent constraint on the
ratio of distances D_V(0.35)/D_V(0.2)=1.736+/-0.065, which is consistent at the
1.1sigma level with the best fit Lambda-CDM model obtained when combining our
z=0.275 distance constraint with the WMAP 5-year data. The offset is similar to
that found in previous analyses of the SDSS DR5 sample, but the discrepancy is
now of lower significance, a change caused by a revised error analysis and a
change in the methodology adopted, as well as the addition of more data. Using
WMAP5 constraints on Omega_bh^2 and Omega_ch^2, and combining our BAO distance
measurements with those from the Union Supernova sample, places a tight
constraint on Omega_m=0.286+/-0.018 and H_0=68.2+/-2.2 km/s/Mpc that is robust
to allowing curvature and non-Lambda dark energy. This result is independent of
the behaviour of dark energy at redshifts greater than those probed by the BAO
and supernova measurements. (abridged)
Comment: 22 pages, 16 figures, minor changes to match version published in
MNRA
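To make the distance definition in the abstract above concrete, the following sketch evaluates D_V(z)=[(1+z)^2D_A^2cz/H(z)]^(1/3) numerically. It assumes a flat Lambda-CDM background with radiation neglected, which is a simplification chosen here and not the paper's fitting procedure; the function names and the Simpson integration are illustrative. The sound horizon r_s(z_d) is not computed.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def hubble(z, h0, omega_m):
    """H(z) in km/s/Mpc for flat Lambda-CDM (assumption: no radiation, no curvature)."""
    return h0 * math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))


def comoving_distance(z, h0, omega_m, steps=1000):
    """Line-of-sight comoving distance in Mpc via composite Simpson's rule."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = z / steps
    total = C_KM_S / hubble(0.0, h0, omega_m) + C_KM_S / hubble(z, h0, omega_m)
    for i in range(1, steps):
        coeff = 4.0 if i % 2 else 2.0
        total += coeff * C_KM_S / hubble(i * h, h0, omega_m)
    return total * h / 3.0


def angular_diameter_distance(z, h0, omega_m):
    """D_A(z) in Mpc; in a flat universe D_A = D_C / (1 + z)."""
    return comoving_distance(z, h0, omega_m) / (1.0 + z)


def volume_distance(z, h0, omega_m):
    """D_V(z) = [(1+z)^2 D_A^2 c z / H(z)]^(1/3) in Mpc."""
    d_a = angular_diameter_distance(z, h0, omega_m)
    return ((1.0 + z) ** 2 * d_a ** 2 * C_KM_S * z / hubble(z, h0, omega_m)) ** (1.0 / 3.0)
```

For instance, `volume_distance(0.275, 68.2, 0.286)` uses the H_0 and Omega_m central values quoted in the abstract; dividing an independently computed sound horizon by that number would give the r_s(z_d)/D_V(0.275) ratio the abstract reports.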